Abstract: In this work, a general information fusion problem is formulated as an optimisation protocol in the space of probability measures (i.e. the so-called Wasserstein metric space). The high-level idea is to consider the data fusion result as the probability measure that is closest to a given collection of input measures, in the sense that it minimises the (weighted) Wasserstein distance between itself and the inputs. After formulating the general information fusion protocol, we consider the explicit computation of the fusion result for two special scenarios that occur frequently in practical applications. Firstly, we show how one can compute the general outcome explicitly with two Gaussian input measures (ignoring any correlation). We then examine the consistency of this result for the scenario in which the two Gaussian inputs have an unknown (but possibly non-zero) correlation. Secondly, we show how one can compute the general fusion result explicitly given two randomly sampled (discrete) empirical measures, which typically have no common underlying support. Data fusion with empirical measures as input is widely applicable, for example in Monte Carlo estimation.

I. Introduction

In this work, a general information fusion problem is formulated as an optimisation protocol in the space of probability measures (i.e. the so-called Wasserstein metric space [1]). The high-level idea is to consider the data fusion outcome as the probability measure that is closest to a given collection of input measures, in the sense that it minimises the (weighted) Wasserstein distance between itself and the inputs.

The classical way to combine continuous conditional measures is to use Bayes' rule, which (roughly) involves multiplying the measures together and then normalising via an integral operation. Alternatively, one may compute the weighted average of the individual probability measures (i.e. sum the weighted measures, with the weights summing to 1, and take this as the combined belief); this is just a probability mixture, often referred to as a linear opinion pool. As a third option, one may take the weighted average of the logarithms of the individual measures and then take the common belief to be the (normalised) exponentiation of this weighted average, often referred to as a log-linear opinion pool. See [2]-[5] for a survey of these three well-studied solutions and their variations. Other variations are discussed in [6], [7].
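To make the three classical rules concrete, here is a minimal Python sketch for two probability mass functions on the same finite support. The function names and the three-point example are illustrative assumptions, not taken from the cited works; note that every rule acts pointwise on a shared support, which is exactly the assumption questioned in Section I-A.

```python
import math

def normalise(p):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(p)
    return [x / total for x in p]

def bayes_product(p, q):
    """Bayes' rule for two pmfs on a common support: multiply pointwise, then normalise."""
    return normalise([a * b for a, b in zip(p, q)])

def linear_pool(ps, ws):
    """Linear opinion pool: weighted mixture of the input pmfs (weights sum to 1)."""
    return [sum(w * p[k] for w, p in zip(ws, ps)) for k in range(len(ps[0]))]

def log_linear_pool(ps, ws):
    """Log-linear opinion pool: exponentiate the weighted average of log-pmfs, then normalise."""
    return normalise([
        math.exp(sum(w * math.log(p[k]) for w, p in zip(ws, ps)))
        for k in range(len(ps[0]))
    ])

# Two pmfs on the same three-point support.
p1 = [0.7, 0.2, 0.1]
p2 = [0.1, 0.3, 0.6]
print(bayes_product(p1, p2))
print(linear_pool([p1, p2], [0.5, 0.5]))
print(log_linear_pool([p1, p2], [0.5, 0.5]))
```

With equal weights the log-linear pool is proportional to the square root of the Bayes product, which illustrates how it tempers the multiplication.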

Other non-probabilistic (or generalised) approaches to information fusion and inference, such as interval calculus, fuzzy logic and Dempster-Shafer theory [6], [7], are not discussed here. Probability measures arising in topics such as random-set theory [6] are also not discussed, but presumably can be accommodated within the proposed framework for information fusion.

One novelty of the proposed general information fusion protocol is that it draws on a very general formulation in which input measures of a general type can be naturally accommodated; e.g. we will show how one can deal straightforwardly with empirical (discrete, randomly sampled) measures. Moreover, the formulation introduced appears to be a mathematically intuitive approach to information fusion, drawing on the rigorous foundation of the Wasserstein metric [1]. The general algorithm introduced here for information fusion also lends itself neatly to distributed computation, as outlined in separate work [8].

A. Direct Information Fusion of Empirical Measures

Note that one downside of existing probabilistic methods [2]-[5] for combining probability measures is that they do not naturally allow one to consider the direct combination of randomly sampled (discrete) measures [9] and/or non-standard probability measures. This limitation follows because these methods inherently act on measures as if they were ‘functions’ with a common support. This assumption of a common support is extremely limiting. For example, the multiplication required by Bayes' rule is simply impossible to carry out (directly) when dealing with empirical measures.

Traditionally, fusion of empirical measures involves temporarily transforming each empirical measure back into a continuous measure using a so-called Kernel method [10], at which point classical fusion results [2], [6], [7], [11] apply. Kernel-based approaches typically scale poorly (without some form of clustering of the components) and may also perform poorly when the underlying sampled measures do not ‘overlap’ sufficiently well. It is worth noting the computationally efficient approximations for sampling the output of the product of input measures (typically two Gaussian mixture inputs), which do not require an explicit computation of the product itself [12], [13]. Such methods naturally approximate the Kernel method approach to fusion with empirical inputs.

An advantage of the proposed method for information fusion is that it completely avoids the multiplication or summation of measures, a task that is impossible to carry out when dealing with randomly sampled measures. Instead, we specialise the general information fusion protocol proposed in this work and provide a directly computable protocol for the direct information fusion of two empirical (discrete) measures. This protocol works by finding a discrete measure that is closest, in the sense of the Wasserstein metric [14], to the two empirical input measures. The real novelty of this presentation lies in the potential applications of such fusion in fields like Monte Carlo estimation [9].

B. Gaussian Information Fusion in the Presence of Unknown Correlation

In classical Gaussian information fusion the optimal solution (in terms of minimising the variance of the fusion result) is straightforwardly computed when the correlation between the input measures is known [7]. When the correlation is unknown, one may employ an algorithm such as covariance intersection (CI) [15], [16], which ensures the output is consistent in the sense that its variance estimate is never less than the actual variance that would arise if one knew the correlation (i.e. the fusion result is never over-confident but rather typically conservative with respect to the variance). The log-linear opinion pool [5] described previously is a generalisation of covariance intersection to arbitrary input measures. Other approaches to Gaussian information fusion exist that consider the case in which the correlation between the input measures is unknown [17]-[21].

We specialise the general information fusion protocol proposed in this work and provide a directly computable protocol for the information fusion of two Gaussian input measures. This result produces a Gaussian fusion output that is the closest Gaussian measure, in the (weighted) Wasserstein metric, to the two input measures. The computation of this output does not take into account any dependence between the inputs, and any such dependence, if it exists, does not appear in the computation. We then study the consistency of this fusion result, in the spirit of covariance intersection [15], when the inputs may or may not be independent.

C. Organisation

In the next section a general information fusion problem is formulated as an optimisation problem in the space of probability measures (i.e. the so-called Wasserstein metric space [1]). The high-level idea is to consider the data fusion outcome as the probability measure that is closest to the given collection of input measures, in the sense that it minimises the (weighted) Wasserstein distance between itself and the collection of inputs. In the subsequent two sections we consider the explicit computation of the data fusion result for two special scenarios that occur frequently in practical applications. Firstly, in Section III the general information fusion solution is computed explicitly for two Gaussian input measures (ignoring any correlation), and then the consistency of this result is examined for the scenario in which the two Gaussian inputs have an unknown (but possibly non-zero) correlation. Then in Section IV the general fusion problem is reduced to one involving two randomly sampled (discrete) empirical measures, which typically have no common underlying support. A conclusion is given in Section V.

II. Information Fusion via the Wasserstein Barycenter

Consider a collection, $i \in \mathcal{V} = \{1, \dots, n\}$, of Radon probability measures $\mu_i$ defined on the Borel sets of $(\mathbb{R}^m, d)$ with $0 < m < \infty$, where $d : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ is the usual Euclidean distance. Denote the space of all such measures on $(\mathbb{R}^m, d)$ by $\mathcal{P}(\mathbb{R}^m)$, and the subset of all such measures with bounded, finite $p$th moment by $\mathcal{P}_p(\mathbb{R}^m)$ for some suitably small $1 \leq p < \infty$. That is, $\mathcal{P}_p$ is the collection of probability measures $\mu$ on the Borel sets of $(\mathbb{R}^m, d)$ such that

$$\int d(x, x_0)^p \, d\mu(x) < \infty \tag{1}$$

for some (and hence every) $x_0 \in \mathbb{R}^m$.

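One can then associate with the space $\mathcal{P}_p$ a metric $\mathcal{W}_p : \mathcal{P}_p \times \mathcal{P}_p \to \mathbb{R}$ defined by

$$\mathcal{W}_p(\mu_i, \mu_j)^p = \inf_{\gamma \in \Gamma(\mu_i, \mu_j)} \int_{\mathbb{R}^m \times \mathbb{R}^m} d(x_i, x_j)^p \, d\gamma(x_i, x_j) \tag{2}$$

where $\Gamma(\mu_i, \mu_j)$ denotes the collection of all measures on $\mathbb{R}^m \times \mathbb{R}^m$ with marginals $\mu_i$ and $\mu_j$ on the first and second factors respectively.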
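When both measures are empirical with $N$ equally weighted atoms, an optimal coupling in (2) can be taken to be a permutation of the atoms (the admissible couplings form a scaled set of doubly stochastic matrices, whose extreme points are permutation matrices by Birkhoff's theorem). The brute-force Python sketch below relies on this. `wasserstein_p_uniform` is a hypothetical helper, enumerating all $N!$ pairings is only sensible for tiny $N$, and the one-dimensional check uses the fact that the sorted pairing is optimal for the cost $|x - y|^p$ with $p \geq 1$.

```python
import itertools
import math

def euclidean(x, y):
    """Euclidean distance between two points given as tuples of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def wasserstein_p_uniform(xs, ys, p=2):
    """W_p between (1/N) sum_j delta_{xs[j]} and (1/N) sum_j delta_{ys[j]}.

    Enumerates all N! pairings, so it is only usable for very small N.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("both empirical measures must have the same number of atoms")
    best = min(
        sum(euclidean(xs[j], ys[perm[j]]) ** p for j in range(n)) / n
        for perm in itertools.permutations(range(n))
    )
    return best ** (1.0 / p)

xs = [(0.0,), (2.0,), (5.0,)]
ys = [(4.0,), (1.0,), (3.5,)]
brute = wasserstein_p_uniform(xs, ys, p=2)

# In one dimension the sorted (monotone) pairing is optimal for |x - y|^p, p >= 1.
pairs = zip(sorted(x[0] for x in xs), sorted(y[0] for y in ys))
sorted_value = math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(xs))
print(brute, sorted_value)
```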

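Suppose one wants to compute

$$\nu = \operatorname*{arg\,min}_{z \in \mathcal{P}_p} \sum_{i \in \mathcal{V}} w_i \, \mathcal{W}_p(z, \mu_i)^p \tag{3}$$

where $w_i \in (0, 1)$ and $\sum_{i \in \mathcal{V}} w_i = 1$. We neglect the trivial case in which $w_i = 1$ for one $i$.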

This operation is a form of information fusion in the sense that we are trying to find a measure that is the ‘closest’ measure to a collection of given input measures (in this case in the sense of the weighted Wasserstein distance). This formulation lends itself naturally to distributed implementation, and the distributed fusion version of this algorithm along with convergence results are introduced in [8].

The Wasserstein metric captures the error in the expected value of a class of functions due to the approximation of one measure by another [22]. Thus, the fusion result can be viewed as a measure whose expected value (for a class of functions) is minimally different, simultaneously and in a weighted sense, from the same expectation under each input measure.
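For intuition, consider (3) with $p = 2$, $m = 1$, two inputs, and empirical measures with the same number of equally weighted atoms. In one dimension a minimiser of (3) is obtained by averaging the sorted samples with the weights $w_i$ (equivalently, averaging the quantile functions). The Python sketch below, with hypothetical helper names and made-up samples, computes this candidate and checks by brute force that no other pairing of the samples yields a smaller value of (3).

```python
import itertools

def w2_squared_1d(u, v):
    """Squared W_2 between two 1-D uniform empirical measures of equal size (sorted pairing)."""
    return sum((a - b) ** 2 for a, b in zip(sorted(u), sorted(v))) / len(u)

def barycenter_1d(samples, weights):
    """Weighted average of the sorted samples of each input (quantile averaging)."""
    sorted_inputs = [sorted(s) for s in samples]
    n = len(sorted_inputs[0])
    return [sum(w * s[k] for w, s in zip(weights, sorted_inputs)) for k in range(n)]

def objective(z, samples, weights):
    """Weighted sum of squared W_2 distances: the cost in (3) with p = 2."""
    return sum(w * w2_squared_1d(z, s) for w, s in zip(weights, samples))

x1 = [0.0, 2.0, 5.0, 6.0]
x2 = [4.0, 1.0, 3.5, 9.0]
weights = [0.3, 0.7]
z_bar = barycenter_1d([x1, x2], weights)
best = objective(z_bar, [x1, x2], weights)

# Brute-force comparison: every other way of pairing the samples does no better.
for perm in itertools.permutations(range(len(x2))):
    z = [weights[0] * x1[k] + weights[1] * x2[perm[k]] for k in range(len(x1))]
    assert objective(z, [x1, x2], weights) >= best - 1e-12
print(z_bar, best)
```

For these samples the optimal value equals $w_1 w_2 \mathcal{W}_2(\mu_1, \mu_2)^2 = 0.21 \times 3.3125$.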

In the remainder, we consider the computation of (3) for two important (application-driven) special cases.


III. General Linear Information Fusion of Two Gaussian Estimates

We consider two (random variable) estimates $a \sim \mathcal{N}(c^*, P_{aa})$ and $b \sim \mathcal{N}(c^*, P_{bb})$ of some fixed parameter $c^* \in \mathbb{R}^m$, $0 < m < \infty$. The estimation errors of $a$ and $b$ are defined by the random variables

$$\tilde{a} = a - c^*, \quad \tilde{b} = b - c^* \tag{4}$$

where, in this case,

$$E[\tilde{a}] = 0, \quad P_{aa} = E[\tilde{a}\tilde{a}^T] > 0 \tag{5}$$

$$E[\tilde{b}] = 0, \quad P_{bb} = E[\tilde{b}\tilde{b}^T] > 0 \tag{6}$$

Although the true values $P_{aa}$ and $P_{bb}$ may not be known, consistent approximations $\hat{P}_{aa}$ and $\hat{P}_{bb}$ are assumed available, where¹

$$\hat{P}_{aa} \geq P_{aa}, \quad \hat{P}_{bb} \geq P_{bb} \tag{7}$$

The cross-correlation matrix between the two estimates is denoted by $P_{ab}$ and is defined by

$$P_{ab} = E[(a - c^*)(b - c^*)^T] = E[\tilde{a}\tilde{b}^T] \tag{8}$$

This matrix may be known or unknown and may even be zero in some applications.

Let $c \sim \mathcal{N}(c^*, P_{cc})$ denote a third estimate of $c^*$ obtained via a linear combination of $a$ and $b$. That is

$$c = K_1 a + K_2 b \tag{9}$$

where $a, b \in \mathbb{R}^m$ and $K_1, K_2 \in \mathbb{R}^{m \times m}$. The error in this estimate is

$$\tilde{c} = c - c^* \tag{10}$$

and obeys $E[\tilde{c}] = 0$ when $K_1 + K_2 = I$.

The true covariance $P_{cc} = E[\tilde{c}\tilde{c}^T]$ is calculated by

$$P_{cc} = K_1 P_{aa} K_1^T + K_2 P_{bb} K_2^T + K_1 P_{ab} K_2^T + K_2 P_{ba} K_1^T \tag{11}$$

and calculating this term requires that $P_{ab} = P_{ba}^T$ be known (when it is non-zero).
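Equation (11) is simply the covariance of the stacked error $[\tilde{a}^T\ \tilde{b}^T]^T$ mapped through $[K_1\ K_2]$. The NumPy sketch below checks this on an arbitrary, randomly generated joint covariance (illustrative only, not data from this work) and prints the eigenvalues of the two cross terms that are lost when the correlation is ignored; such eigenvalues can have either sign, which is why ignoring $P_{ab}$ can produce an over-confident estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3

# Illustrative joint covariance of the stacked errors [a~; b~] (any symmetric PSD matrix will do).
G = rng.standard_normal((2 * m, 2 * m))
M = G @ G.T + 0.1 * np.eye(2 * m)
P_aa, P_ab = M[:m, :m], M[:m, m:]
P_ba, P_bb = M[m:, :m], M[m:, m:]

# Gains satisfying K1 + K2 = I, so that the fused estimate is unbiased.
K1 = 0.4 * np.eye(m)
K2 = np.eye(m) - K1

# Equation (11), term by term.
P_cc = K1 @ P_aa @ K1.T + K2 @ P_bb @ K2.T + K1 @ P_ab @ K2.T + K2 @ P_ba @ K1.T

# The same quantity via the stacked gain K = [K1 K2].
K = np.hstack([K1, K2])
P_cc_joint = K @ M @ K.T

# Ignoring the correlation drops the last two terms of (11).
P_cc_ignoring = K1 @ P_aa @ K1.T + K2 @ P_bb @ K2.T
print(np.allclose(P_cc, P_cc_joint))
print(np.linalg.eigvalsh(P_cc - P_cc_ignoring))
```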

We are mainly interested in the construction of an estimator $c$ defined by some $K_1$ and $K_2$, and also an estimate $\hat{P}_{cc}$ of $P_{cc}$, when the cross-correlation $P_{ab}$ is non-zero but unknown. We are further interested in certain properties of the resulting $\hat{P}_{cc}$. In particular, we are interested in the property of consistency

$$\hat{P}_{cc} \geq P_{cc} \tag{12}$$

where $P_{cc}$ is given by (11), which depends on the cross-correlation $P_{ab}$ (or some estimate thereof), which we assume unavailable.

Definition 1. Suppose $\hat{P}_{aa}$ and $\hat{P}_{bb}$ are given, and suppose $P_{ab} = P_{ba}^T$ is non-zero. Suppose an estimator $c$ for $c^*$ is given in the form (9). The true covariance of $c$ is denoted by $P_{cc}$. An estimate $\hat{P}_{cc}$ of $P_{cc}$ is said to be consistent if

$$\hat{P}_{cc} \geq P_{cc} \tag{13}$$

where $P_{cc}$ is defined by (11).

¹This inequality is in the sense of the positive-semidefinite (Loewner) ordering of symmetric matrices, i.e. $\hat{P}_{aa} - P_{aa}$ is positive semidefinite.

It is often the case that ignoring the correlation $P_{ab}$ when fusing $a$ and $b$ can lead to overly confident results; i.e. the resulting estimate of $P_{cc}$ will be inconsistent as per Definition 1. Some algorithms, such as covariance intersection (CI) [15], [16], are on the other hand designed to generate consistent estimates when the cross-correlation is unknown. In many cases, the resulting estimators are considerably conservative.

A. Information Fusion in the Wasserstein Space of Gaussian Probability Measures

Consider two Gaussian probability measures $\mu_i$, $i \in \{a, b\}$, defined on the Borel sets of $(\mathbb{R}^m, d)$, and suppose that $\mu_a$ and $\mu_b$ admit Gaussian densities of the form $\mathcal{N}(c^*, P_{aa})$ and $\mathcal{N}(c^*, P_{bb})$ given some fixed parameter $c^*$. Denote the space of all Gaussian measures on $(\mathbb{R}^m, d)$ with bounded second moment by $\mathcal{G}_2(\mathbb{R}^m) \subset \mathcal{P}_2(\mathbb{R}^m)$.

Suppose one wants to compute

$$\mu_c = \operatorname*{arg\,min}_{z \in \mathcal{G}_2} \; w_1 \mathcal{W}_2(z, \mu_a)^2 + w_2 \mathcal{W}_2(z, \mu_b)^2 \tag{14}$$

where $w_i \in (0, 1)$ and $w_1 + w_2 = 1$. This operation is clearly a special case of (3) where $p = 2$, $|\mathcal{V}| = 2$, and we restrict ourselves to the space $\mathcal{G}_2(\mathbb{R}^m) \subset \mathcal{P}_2(\mathbb{R}^m)$. Note that this operation does not consider any dependence between the inputs.

Theorem 1 ([14], [23]). Suppose that $\mu_a$ and $\mu_b$ admit Gaussian probability densities of the form $\mathcal{N}(c^*, P_{aa})$ and $\mathcal{N}(c^*, P_{bb})$ given some fixed parameter $c^*$. Then $\mu_c$ defined as the solution to

$$\mu_c = \operatorname*{arg\,min}_{z \in \mathcal{P}_2} \; (1 - w)\mathcal{W}_2(z, \mu_a)^2 + w\mathcal{W}_2(z, \mu_b)^2 \tag{15}$$

with $w \in (0, 1)$ exists and is unique. Moreover, $\mu_c$ is in $\mathcal{G}_2$ and admits a Gaussian density of the form $\mathcal{N}(c, P_{cc})$, where

$$c = (1 - w)c^* + wc^* = c^* \tag{16}$$

$$P_{cc} = ((1 - w)I + w\Pi) \, P_{aa} \, ((1 - w)I + w\Pi) \tag{17}$$

where $\Pi = P_{bb}^{1/2}(P_{bb}^{1/2} P_{aa} P_{bb}^{1/2})^{-1/2} P_{bb}^{1/2}$.
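Theorem 1 can be spot-checked numerically with the well-known closed form for the 2-Wasserstein distance between Gaussians with equal means, $\mathcal{W}_2(\mathcal{N}(c^*, A), \mathcal{N}(c^*, B))^2 = \operatorname{tr}(A + B - 2(A^{1/2} B A^{1/2})^{1/2})$. The NumPy sketch below (hypothetical helper names, randomly generated covariances) builds $P_{cc}$ from (17) and compares distances. Since $\mu_c$ is the McCann interpolant between $\mu_a$ and $\mu_b$, one expects $\mathcal{W}_2(\mu_a, \mu_c) = w\,\mathcal{W}_2(\mu_a, \mu_b)$ and $\mathcal{W}_2(\mu_c, \mu_b) = (1 - w)\,\mathcal{W}_2(\mu_a, \mu_b)$.

```python
import numpy as np

def sqrtm_psd(A):
    """Principal square root of a symmetric positive semidefinite matrix via eigh."""
    A = 0.5 * (A + A.T)
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def inv_sqrtm_pd(A):
    """Inverse principal square root of a symmetric positive definite matrix."""
    A = 0.5 * (A + A.T)
    vals, vecs = np.linalg.eigh(A)
    return (vecs / np.sqrt(vals)) @ vecs.T

def w2_gaussian_sq(A, B):
    """Squared 2-Wasserstein distance between N(c*, A) and N(c*, B) (equal means)."""
    rA = sqrtm_psd(A)
    return float(np.trace(A + B - 2.0 * sqrtm_psd(rA @ B @ rA)))

def fused_covariance(P_aa, P_bb, w):
    """P_cc of (17), with Pi = P_bb^{1/2} (P_bb^{1/2} P_aa P_bb^{1/2})^{-1/2} P_bb^{1/2}."""
    rB = sqrtm_psd(P_bb)
    Pi = rB @ inv_sqrtm_pd(rB @ P_aa @ rB) @ rB
    T = (1.0 - w) * np.eye(len(P_aa)) + w * Pi
    return T @ P_aa @ T

rng = np.random.default_rng(1)
X, Y = rng.standard_normal((2, 3, 3))
P_aa = X @ X.T + 0.5 * np.eye(3)
P_bb = Y @ Y.T + 0.5 * np.eye(3)
w = 0.3
P_cc = fused_covariance(P_aa, P_bb, w)

d_ab = np.sqrt(w2_gaussian_sq(P_aa, P_bb))
d_ac = np.sqrt(w2_gaussian_sq(P_aa, P_cc))
d_cb = np.sqrt(w2_gaussian_sq(P_cc, P_bb))
print(d_ac, w * d_ab)          # expected to agree
print(d_cb, (1.0 - w) * d_ab)  # expected to agree
```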

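We now have the following information fusion estimate (termed the Wasserstein-Gaussian Information Fusion Algorithm):

$$c = (1 - w)a + wb$$

$$\hat{P}_{cc} = ((1 - w)I + w\Pi) \, \hat{P}_{aa} \, ((1 - w)I + w\Pi)$$

$$\Pi = \hat{P}_{bb}^{1/2}(\hat{P}_{bb}^{1/2} \hat{P}_{aa} \hat{P}_{bb}^{1/2})^{-1/2} \hat{P}_{bb}^{1/2}, \quad w \in (0, 1)$$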
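A minimal NumPy sketch of the Wasserstein-Gaussian Information Fusion Algorithm follows. The function name `wasserstein_gaussian_fusion` is hypothetical, matrix powers are computed by symmetric eigendecomposition (which assumes $\hat{P}_{aa}$ and $\hat{P}_{bb}$ are symmetric positive definite), and no cross-covariance enters the computation. The usage line is a scalar case, where $\hat{P}_{cc}$ reduces to $((1 - w)\hat{p}_{aa}^{1/2} + w\hat{p}_{bb}^{1/2})^2$; with $\hat{p}_{aa} = 4$, $\hat{p}_{bb} = 1$ and $w = 0.25$ this gives $c = 1.25$ and $\hat{P}_{cc} = 1.75^2 = 3.0625$.

```python
import numpy as np

def _sym_power(A, power):
    """A**power for a symmetric positive definite matrix A, via eigendecomposition."""
    A = 0.5 * (A + A.T)
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals ** power) @ vecs.T

def wasserstein_gaussian_fusion(a, P_aa_hat, b, P_bb_hat, w):
    """Fuse (a, P_aa_hat) and (b, P_bb_hat) with weight w in (0, 1).

    Returns c = (1 - w) a + w b and
    P_cc_hat = ((1 - w) I + w Pi) P_aa_hat ((1 - w) I + w Pi), where
    Pi = P_bb^{1/2} (P_bb^{1/2} P_aa P_bb^{1/2})^{-1/2} P_bb^{1/2}.
    No cross-covariance is used.
    """
    if not 0.0 < w < 1.0:
        raise ValueError("w must lie in the open interval (0, 1)")
    rB = _sym_power(P_bb_hat, 0.5)
    Pi = rB @ _sym_power(rB @ P_aa_hat @ rB, -0.5) @ rB
    T = (1.0 - w) * np.eye(len(a)) + w * Pi
    c = (1.0 - w) * np.asarray(a) + w * np.asarray(b)
    return c, T @ P_aa_hat @ T

# Scalar sanity check: P_cc_hat = ((1 - w) sqrt(p_aa) + w sqrt(p_bb))**2.
c, P = wasserstein_gaussian_fusion(np.array([1.0]), np.array([[4.0]]),
                                   np.array([2.0]), np.array([[1.0]]), w=0.25)
print(c, P)
```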

This algorithm is motivated by the fact that it defines the best available approximation (computed from $\hat{P}_{aa}$ and $\hat{P}_{bb}$) to the measure that is closest, in the weighted Wasserstein sense, to the two input measures defined exactly by the distributions of $a$ and $b$. The output measure is thus a kind of weighted average of the two input measures in the sense of a probabilistically justified and intuitive metric (i.e. the Wasserstein metric). It is appealing because finding a measure that is close to two input measures in this sense does not depend on the correlation $P_{ab}$, which is often assumed unknown (but possibly non-zero).

Practically, this estimation algorithm takes two Gaussian (random variable) inputs $a$ and $b$ and produces a third (random variable) output $c = (1 - w)a + wb$, which has mean $c^*$ and true covariance

$$P_{cc} = (1 - w)^2 P_{aa} + w^2 P_{bb} + w(1 - w)P_{ab} + w(1 - w)P_{ba} \tag{18}$$

from (11). Of course, given $c = (1 - w)a + wb$ (or any other estimator of the form (9)), it is not possible to compute the true covariance (or use (11) at all, say with estimates $\hat{P}_{aa} \geq P_{aa}$ etc.) when $P_{ab}$ is unknown and non-zero. Hence the construction of $\hat{P}_{cc}$ as a function of $\hat{P}_{aa}$ and $\hat{P}_{bb}$ alone. We want $\hat{P}_{cc}$ to be a good representation of $P_{cc}$, but it is also commonly accepted that one wants consistency, $\hat{P}_{cc} \geq P_{cc}$, so that the estimator is not over-confident.

We now have the following main result.

Proposition 1. Let $w \in (0, 1)$ and suppose an estimator $c$ for $c^*$ is given by the Wasserstein-Gaussian Information Fusion Algorithm, along with the associated covariance matrix $\hat{P}_{cc}$. Computation of $\hat{P}_{cc}$ uses $\hat{P}_{aa} \geq P_{aa}$ and $\hat{P}_{bb} \geq P_{bb}$, which are guaranteed consistent. Then

$$\hat{P}_{cc} \geq P_{cc} \tag{19}$$

is guaranteed if and only if

$$(P_{aa}P_{bb})^{1/2} + (P_{bb}P_{aa})^{1/2} \geq P_{ab} + P_{ba} \tag{20}$$

where $P_{cc}$ is the true covariance of $c$ given by (18).

Before proceeding to the proof we need the following lemmas.

Lemma 1. Let $\Pi = P_{bb}^{1/2}(P_{bb}^{1/2} P_{aa} P_{bb}^{1/2})^{-1/2} P_{bb}^{1/2}$. The following statements hold:

1) $\Pi = P_{aa}^{-1/2}(P_{aa}^{1/2} P_{bb} P_{aa}^{1/2})^{1/2} P_{aa}^{-1/2}$

2) $\Pi P_{aa} \Pi = P_{bb}$

Proof: Matrices of the form $\Pi$ are well studied and correspond to the matrix geometric mean of $P_{bb}$ and $P_{aa}^{-1}$. The identities are found in [24]. ■

Lemma 2. Let $\Pi = P_{bb}^{1/2}(P_{bb}^{1/2} P_{aa} P_{bb}^{1/2})^{-1/2} P_{bb}^{1/2}$. The following statements hold:

1) $\Pi P_{aa} = (P_{bb}P_{aa})^{1/2}$

2) $P_{aa}\Pi = (P_{aa}P_{bb})^{1/2}$

Proof: With $\Pi$ as defined, we use the identities $\Pi = (P_{bb}P_{aa})^{1/2}P_{aa}^{-1} = P_{aa}^{-1}(P_{aa}P_{bb})^{1/2}$, as found in [24]. ■
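Lemmas 1 and 2 are easy to spot-check numerically. The NumPy sketch below draws illustrative random positive definite matrices, forms $\Pi$ as in the lemmas and verifies each statement. For Lemma 2 it checks that $\Pi P_{aa}$ squares to $P_{bb}P_{aa}$ (and $P_{aa}\Pi$ to $P_{aa}P_{bb}$) and has a positive spectrum, which identifies it as the principal square root without requiring a general non-symmetric matrix square root routine.

```python
import numpy as np

def sym_power(A, power):
    """A**power for a symmetric positive definite matrix A, via eigendecomposition."""
    A = 0.5 * (A + A.T)
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals ** power) @ vecs.T

rng = np.random.default_rng(2)
X, Y = rng.standard_normal((2, 4, 4))
P_aa = X @ X.T + 0.5 * np.eye(4)
P_bb = Y @ Y.T + 0.5 * np.eye(4)

rA, rA_inv, rB = sym_power(P_aa, 0.5), sym_power(P_aa, -0.5), sym_power(P_bb, 0.5)
Pi = rB @ sym_power(rB @ P_aa @ rB, -0.5) @ rB              # definition in Lemmas 1 and 2
Pi_alt = rA_inv @ sym_power(rA @ P_bb @ rA, 0.5) @ rA_inv   # Lemma 1, statement 1)

print(np.allclose(Pi, Pi_alt))                                # Lemma 1, 1)
print(np.allclose(Pi @ P_aa @ Pi, P_bb))                      # Lemma 1, 2)
print(np.allclose((Pi @ P_aa) @ (Pi @ P_aa), P_bb @ P_aa))    # Lemma 2, 1), squared
print(np.allclose((P_aa @ Pi) @ (P_aa @ Pi), P_aa @ P_bb))    # Lemma 2, 2), squared
print(np.all(np.linalg.eigvals(Pi @ P_aa).real > 0))          # positive spectrum: principal root
```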

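We now proceed to the proof of the main proposition.

Proof of Proposition 1: For the remainder of the proof let $\Pi = \hat{P}_{bb}^{1/2}(\hat{P}_{bb}^{1/2}\hat{P}_{aa}\hat{P}_{bb}^{1/2})^{-1/2}\hat{P}_{bb}^{1/2}$. Now, by expanding the quadratic form of $\hat{P}_{cc}$, we get

$$\begin{aligned} \hat{P}_{cc} - P_{cc} = {} & w^2 \Pi\hat{P}_{aa}\Pi + (1 - w)^2 \hat{P}_{aa} + w(1 - w)[\Pi\hat{P}_{aa} + \hat{P}_{aa}\Pi] \\ & - w(1 - w)[P_{ab} + P_{ba}] - w^2 P_{bb} - (1 - w)^2 P_{aa} \end{aligned} \tag{21}$$

From Lemma 1 we have $\Pi\hat{P}_{aa}\Pi = \hat{P}_{bb}$ and thus

$$\begin{aligned} \hat{P}_{cc} - P_{cc} &\geq w(1 - w)[\Pi\hat{P}_{aa} + \hat{P}_{aa}\Pi] - w(1 - w)[P_{ab} + P_{ba}] \\ &= w(1 - w)[\Pi\hat{P}_{aa} + \hat{P}_{aa}\Pi - P_{ab} - P_{ba}] \end{aligned} \tag{22}$$

using the relations $\hat{P}_{aa} \geq P_{aa}$ and $\hat{P}_{bb} \geq P_{bb}$.

It then follows that $\hat{P}_{cc} - P_{cc} \geq 0$ if and only if

$$\Pi\hat{P}_{aa} + \hat{P}_{aa}\Pi - P_{ab} - P_{ba} \geq 0 \tag{23}$$

Applying Lemma 2, it follows that $\hat{P}_{cc} - P_{cc} \geq 0$ if and only if

$$(\hat{P}_{aa}\hat{P}_{bb})^{1/2} + (\hat{P}_{bb}\hat{P}_{aa})^{1/2} - P_{ab} - P_{ba} \geq 0 \tag{24}$$

and to guarantee consistency for all possible $\hat{P}_{aa} \geq P_{aa}$ and $\hat{P}_{bb} \geq P_{bb}$ we set $\hat{P}_{aa} = P_{aa}$ and $\hat{P}_{bb} = P_{bb}$, and the proof is complete. ■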
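In the worst case $\hat{P}_{aa} = P_{aa}$, $\hat{P}_{bb} = P_{bb}$ used at the end of the proof, (21) and Lemmas 1 and 2 give the exact identity $\hat{P}_{cc} - P_{cc} = w(1 - w)[(P_{aa}P_{bb})^{1/2} + (P_{bb}P_{aa})^{1/2} - P_{ab} - P_{ba}]$. The NumPy sketch below checks this identity on an illustrative random joint covariance and prints the smallest eigenvalue of the bracketed matrix, whose sign decides (20) for that particular example.

```python
import numpy as np

def sym_power(A, power):
    """A**power for a symmetric positive definite matrix A, via eigendecomposition."""
    A = 0.5 * (A + A.T)
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals ** power) @ vecs.T

rng = np.random.default_rng(3)
m, w = 2, 0.4
G = rng.standard_normal((2 * m, 2 * m))
M = G @ G.T + 0.1 * np.eye(2 * m)            # illustrative joint covariance of [a~; b~]
P_aa, P_ab, P_bb = M[:m, :m], M[:m, m:], M[m:, m:]
P_ba = P_ab.T

rB = sym_power(P_bb, 0.5)
Pi = rB @ sym_power(rB @ P_aa @ rB, -0.5) @ rB
T = (1 - w) * np.eye(m) + w * Pi
P_cc_hat = T @ P_aa @ T                                                        # (17), hats = truth
P_cc_true = (1 - w) ** 2 * P_aa + w ** 2 * P_bb + w * (1 - w) * (P_ab + P_ba)  # (18)

# By Lemma 2, (P_aa P_bb)^{1/2} = P_aa Pi and (P_bb P_aa)^{1/2} = Pi P_aa.
D = P_aa @ Pi + Pi @ P_aa - P_ab - P_ba
print(np.allclose(P_cc_hat - P_cc_true, w * (1 - w) * D))
print("smallest eigenvalue of the matrix in (20):", np.linalg.eigvalsh(0.5 * (D + D.T)).min())
```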

One may question whether the inequality

$$(P_{aa}P_{bb})^{1/2} + (P_{bb}P_{aa})^{1/2} - P_{ab} - P_{ba} \geq 0 \tag{25}$$

required by the proposition is automatically satisfied. Indeed, the authors initially suspected this may be the case. We explore this idea further now, and note the following lemma.

Lemma 3. Let $P_{aa} > 0$, $P_{bb} > 0$ and $P_{ab}$ be defined as before and given, and then define

$$M = \begin{bmatrix} P_{aa} & P_{ab} \\ P_{ba} & P_{bb} \end{bmatrix} \tag{26}$$

noting $M \geq 0$. Then there exists a contractive matrix $S$, defined to obey $I - SS^T \geq 0$, such that

$$P_{ab} + P_{ba} = P_{aa}^{1/2} S P_{bb}^{1/2} + P_{bb}^{1/2} S^T P_{aa}^{1/2}$$

Proof: See [24]. ■

This lemma may be used to generalise the notion of a correlation coefficient found in the scalar case.

Corollary 1. Let $w \in (0, 1)$ and suppose a scalar estimator $c$ for $c^* \in \mathbb{R}$ is given by the Wasserstein-Gaussian Information Fusion Algorithm, along with the associated variance $\hat{p}_{cc}$. Computation of $\hat{p}_{cc}$ uses the scalars $\hat{p}_{aa} \geq p_{aa}$ and $\hat{p}_{bb} \geq p_{bb}$, which are guaranteed consistent. Then it always holds that

$$\hat{p}_{cc} \geq p_{cc} \tag{27}$$

where $p_{cc}$ is the true variance of $c$ given by (18).

Proof: We need

$$(p_{aa}p_{bb})^{1/2} + (p_{bb}p_{aa})^{1/2} - p_{ab} - p_{ba} \geq 0 \tag{28}$$

and from Lemma 3 note that $p_{ab} = p_{ba} = s(p_{bb}p_{aa})^{1/2}$ for some $s \in [-1, 1]$, so the left-hand side of (28) equals $2(1 - s)(p_{aa}p_{bb})^{1/2} \geq 0$. ■

In the scalar case, it is interesting to note that $\hat{p}_{cc} = p_{cc}$ when $\hat{p}_{aa} = p_{aa}$, $\hat{p}_{bb} = p_{bb}$ and the correlation coefficient is $s = 1$. Indeed, the variance estimate becomes more conservative relative to the underlying true variance as $s$ decreases from 1.
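Substituting $p_{ab} = p_{ba} = s(p_{aa}p_{bb})^{1/2}$ into the scalar forms of (17) and (18) gives, for $\hat{p}_{aa} = p_{aa}$ and $\hat{p}_{bb} = p_{bb}$, the gap $\hat{p}_{cc} - p_{cc} = 2w(1 - w)(1 - s)(p_{aa}p_{bb})^{1/2}$. The short Python sketch below evaluates both variances directly for arbitrary example values ($p_{aa} = 4$, $p_{bb} = 1$, $w = 0.3$), for which the gap is $0.84(1 - s)$.

```python
import math

def fused_variance_hat(p_aa, p_bb, w):
    """Scalar form of (17): ((1 - w) sqrt(p_aa) + w sqrt(p_bb))**2."""
    return ((1 - w) * math.sqrt(p_aa) + w * math.sqrt(p_bb)) ** 2

def true_variance(p_aa, p_bb, s, w):
    """Scalar form of (18) with p_ab = p_ba = s * sqrt(p_aa * p_bb)."""
    p_ab = s * math.sqrt(p_aa * p_bb)
    return (1 - w) ** 2 * p_aa + w ** 2 * p_bb + 2 * w * (1 - w) * p_ab

p_aa, p_bb, w = 4.0, 1.0, 0.3
for s in (-1.0, 0.0, 0.5, 1.0):
    gap = fused_variance_hat(p_aa, p_bb, w) - true_variance(p_aa, p_bb, s, w)
    print(f"s = {s:+.1f}: gap = {gap:.4f}")
```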

In any finite dimension, the required condition for consistency can be rewritten using Lemma 3 as

$$(P_{aa}P_{bb})^{1/2} + (P_{bb}P_{aa})^{1/2} - P_{aa}^{1/2} S P_{bb}^{1/2} - P_{bb}^{1/2} S^T P_{aa}^{1/2} \geq 0 \tag{29}$$

where $S$ is a contractive matrix obeying $I - SS^T \geq 0$. Unfortunately, in higher-dimensional cases (beyond scalars) the inequality required for consistency may fail to hold, and consistency then depends on how the cross-correlation between the individual estimates relates to their individual covariances.

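One further important note is that, in the worst case $\hat{P}_{aa} = P_{aa}$ and $\hat{P}_{bb} = P_{bb}$ considered at the end of the proof of Proposition 1, the difference $\hat{P}_{cc} - P_{cc}$ is $w(1 - w)$ times a matrix that does not depend on $w$, so whether it is positive semidefinite is independent of $w \in (0, 1)$. This is not to say that $\hat{P}_{cc}$ does not change with $w$, but rather that the consistency of $\hat{P}_{cc}$ does not vary with $w$.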

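Conjecture 1. Let $S$ obey $I - SS^T \geq 0$. Then

$$(P_{aa}P_{bb})^{1/2} + (P_{bb}P_{aa})^{1/2} - P_{aa}^{1/2} S P_{bb}^{1/2} - P_{bb}^{1/2} S^T P_{aa}^{1/2} \not< 0 \tag{30}$$

(i.e. the left-hand side is never negative definite) for all $P_{aa} > 0$ and $P_{bb} > 0$.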

This conjecture has been tested via numerous random examples. Its significance is that it immediately implies that either the proposed estimator is consistent, $\hat{P}_{cc} - P_{cc} \geq 0$, or that $\hat{P}_{cc} \not\geq P_{cc}$ and $\hat{P}_{cc} \not< P_{cc}$. In other words, the proposed covariance estimate may fail to be consistent, but it will never be strictly less (in the positive-definite sense) than the true covariance of the estimate. That is, the conjecture implies $\hat{P}_{cc} \not< P_{cc}$. This is a desirable property, which implies that the proposed covariance estimate is never over-confident in every direction simultaneously.
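A random test of Conjecture 1 could look like the NumPy sketch below. It draws random positive definite $P_{aa}$, $P_{bb}$ and a random contraction $S$ (a Gaussian matrix rescaled so its largest singular value is at most one, which ensures $I - SS^T \geq 0$), forms the left-hand side of (30) and tracks its largest eigenvalue; (30) would fail for a sample only if that eigenvalue were negative. The sampling scheme, dimensions and number of trials are our own assumptions, and a test of this kind is evidence rather than proof.

```python
import numpy as np

def sym_power(A, power):
    """A**power for a symmetric positive definite matrix A, via eigendecomposition."""
    A = 0.5 * (A + A.T)
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals ** power) @ vecs.T

def lhs_of_30(P_aa, P_bb, S):
    """Left-hand side of (29)/(30) for given P_aa, P_bb and contraction S."""
    rA, rA_inv, rB = sym_power(P_aa, 0.5), sym_power(P_aa, -0.5), sym_power(P_bb, 0.5)
    # Principal square root of P_aa P_bb: P_aa^{1/2} (P_aa^{1/2} P_bb P_aa^{1/2})^{1/2} P_aa^{-1/2}.
    root_ab = rA @ sym_power(rA @ P_bb @ rA, 0.5) @ rA_inv
    cross = rA @ S @ rB
    # (P_bb P_aa)^{1/2} is the transpose of (P_aa P_bb)^{1/2}.
    return root_ab + root_ab.T - cross - cross.T

rng = np.random.default_rng(4)
worst = np.inf
for _ in range(2000):
    m = int(rng.integers(2, 6))
    X, Y, G = rng.standard_normal((3, m, m))
    P_aa = X @ X.T + 0.1 * np.eye(m)
    P_bb = Y @ Y.T + 0.1 * np.eye(m)
    S = rng.uniform(0.0, 1.0) * G / np.linalg.norm(G, 2)  # largest singular value <= 1
    E = lhs_of_30(P_aa, P_bb, S)
    worst = min(worst, np.linalg.eigvalsh(0.5 * (E + E.T)).max())
print("smallest observed value of the largest eigenvalue:", worst)
```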

Finally, we note that in practice one is typically interested not in the estimator for all values $w \in (0, 1)$ but rather in the estimator for a particular value of $w$. We now have the following information fusion estimate (termed the Optimal Wasserstein-Gaussian Information Fusion Algorithm):

$$c = (1 - w^*)a + w^*b$$

$$\hat{P}_{cc} = ((1 - w^*)I + w^*\Pi) \, \hat{P}_{aa} \, ((1 - w^*)I + w^*\Pi)$$

$$\Pi = \hat{P}_{bb}^{1/2}(\hat{P}_{bb}^{1/2} \hat{P}_{aa} \hat{P}_{bb}^{1/2})^{-1/2} \hat{P}_{bb}^{1/2}$$

$$w^* = \operatorname*{arg\,min}_{w \in (0, 1)} \operatorname{tr} \hat{P}_{cc}\big|_{w}$$

We do not explore the details of computing an optimal $w^*$ via a particular optimisation method here, but note simply that numerous protocols are applicable, including a simple line search algorithm. Other criteria beyond the trace could also be substituted without difficulty. As noted previously, in the worst case $\hat{P}_{aa} = P_{aa}$, $\hat{P}_{bb} = P_{bb}$ the particular value of $w$ has no effect on whether $\hat{P}_{cc} - P_{cc}$ is positive semidefinite.
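As one concrete instance of such a line search, the NumPy sketch below applies golden-section search to $w \mapsto \operatorname{tr}\hat{P}_{cc}(w)$ on $[0, 1]$. The names and tolerance are assumptions for illustration. Expanding (17) shows that the trace is a convex quadratic in $w$, so the search is well posed, although the minimiser may lie at an endpoint (as happens in the scalar case), in which case the returned value approaches 0 or 1 rather than an interior point. In the diagonal example, $\hat{P}_{cc}(w) = \operatorname{diag}((1 + w)^2, (2 - w)^2)$, so the minimiser is $w^* = 0.5$ with trace 4.5.

```python
import math
import numpy as np

def sym_power(A, power):
    """A**power for a symmetric positive definite matrix A, via eigendecomposition."""
    A = 0.5 * (A + A.T)
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals ** power) @ vecs.T

def trace_fused_cov(P_aa_hat, P_bb_hat, w):
    """tr P_cc_hat(w) for the Wasserstein-Gaussian Information Fusion Algorithm."""
    rB = sym_power(P_bb_hat, 0.5)
    Pi = rB @ sym_power(rB @ P_aa_hat @ rB, -0.5) @ rB
    T = (1.0 - w) * np.eye(len(P_aa_hat)) + w * Pi
    return float(np.trace(T @ P_aa_hat @ T))

def optimal_weight(P_aa_hat, P_bb_hat, tol=1e-8):
    """Golden-section search for the minimiser of tr P_cc_hat(w) over [0, 1]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, 1.0
    x1, x2 = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    f1, f2 = trace_fused_cov(P_aa_hat, P_bb_hat, x1), trace_fused_cov(P_aa_hat, P_bb_hat, x2)
    while hi - lo > tol:
        if f1 < f2:   # minimiser lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = trace_fused_cov(P_aa_hat, P_bb_hat, x1)
        else:         # minimiser lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = trace_fused_cov(P_aa_hat, P_bb_hat, x2)
    return 0.5 * (lo + hi)

A, B = np.diag([1.0, 4.0]), np.diag([4.0, 1.0])
w_star = optimal_weight(A, B)
print(w_star, trace_fused_cov(A, B, w_star))
```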


IV. Direct Information Fusion of Empirical Measures

Again let $|\mathcal{V}| = 2$ and suppose that an arbitrary continuous $\mu_i \in \mathcal{P}_p$ supported on $\mathbb{R}^m$ is approximated by some discrete empirical measure $\mu_i^N$ such that $\mu_i^N \to \mu_i$ as $N \to \infty$, in the sense that $\mathcal{W}_p(\mu_i^N, \mu_i) \to 0$ for all $1 \leq p < \infty$². Here

$$\mu_i^N = \frac{1}{N}\sum_{j=1}^{N} \delta_{x_j^i} \tag{31}$$

where, for example, $x_j^i$ can be considered a realisation of a random variable drawn independently from $\mu_i$. Let $\mathcal{D}_p \subset \mathcal{P}_p$ denote the space of all such measures. We drop the superscript $N$ but note that each $\mu_i$, $\forall i \in \{1, 2\}$, is defined by the same $N$.

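Let $p = 2$ going forward. Consider the finite set $U_i = \{x_1^i, \dots, x_N^i\}$ of sample points (indexed from 1 to $N$). One can write $\mathcal{W}_2(\mu_i, \mu_j)^2$ as

$$\mathcal{W}_2(\mu_i, \mu_j)^2 = \min_{\sigma^i_k,\, \sigma^j_k,\; k \in \{1, \dots, N\}} \frac{1}{N}\sum_{k=1}^{N} \|x^i_{\sigma^i_k} - x^j_{\sigma^j_k}\|^2 \tag{32}$$

where the minimisation is taken over all permutations $\sigma^i : \{1, \dots, N\} \to \{1, \dots, N\}$.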

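It now follows that

$$\nu = \operatorname*{arg\,min}_{z \in \mathcal{D}_2} \sum_{i \in \{1, 2\}} w_i \mathcal{W}_2(z, \mu_i)^2 = \frac{1}{N}\sum_{k=1}^{N} \delta_{z_k^*} \tag{33}$$

where $Z^* = \{z_1^*, \dots, z_N^*\}$ is a finite set.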


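For a given $Z^*$ we can define a vector $\mathbf{z}^* = [z_1^{*T}, \dots, z_N^{*T}]^T$ and

$$\mathbf{z} = \operatorname*{arg\,min}_{z_k \in \mathbb{R}^m} \; w_1 \min_{\sigma^1} \frac{1}{N}\sum_{k=1}^{N} \|z_k - x^1_{\sigma^1_k}\|^2 + w_2 \min_{\sigma^2} \frac{1}{N}\sum_{k=1}^{N} \|z_k - x^2_{\sigma^2_k}\|^2 \tag{34}$$

For some fixed permutations $\sigma^j$, $\forall j \in \{1, 2\}$, consider

$$\mathbf{z}^* = \operatorname*{arg\,min}_{z_k \in \mathbb{R}^m} \frac{1}{N}\sum_{k=1}^{N} w_1\|z_k - x^1_{\sigma^1_k}\|^2 + w_2\|z_k - x^2_{\sigma^2_k}\|^2 \tag{35}$$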

The solution to this optimisation problem is easily given by

$$z_k^* = w_1 x^1_{\sigma^1_k} + w_2 x^2_{\sigma^2_k}, \quad \forall k \in \{1, \dots, N\} \tag{36}$$

with $w_i \in (0, 1)$ and $w_1 + w_2 = 1$.

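It then follows that the optimal permutations are

$$\{\sigma^{1*}, \sigma^{2*}\} = \operatorname*{arg\,min}_{\sigma^j,\, j \in \{1, 2\}} \frac{1}{N}\sum_{k=1}^{N} \left[ w_1\|z_k^* - x^1_{\sigma^1_k}\|^2 + w_2\|z_k^* - x^2_{\sigma^2_k}\|^2 \right] \tag{37}$$

$$= \operatorname*{arg\,min}_{\sigma^j,\, j \in \{1, 2\}} \frac{1}{N}\sum_{k=1}^{N} \left[ w_1 w_2^2 \|x^1_{\sigma^1_k} - x^2_{\sigma^2_k}\|^2 + w_2 w_1^2 \|x^1_{\sigma^1_k} - x^2_{\sigma^2_k}\|^2 \right] \tag{38}$$

where the arguments of the minimisation are the permutations $\sigma^1$ and $\sigma^2$, and $z_k^*$ is given by (36). Since $w_1 w_2^2 + w_2 w_1^2 = w_1 w_2$, the objective in (38) equals $w_1 w_2 \frac{1}{N}\sum_{k=1}^{N}\|x^1_{\sigma^1_k} - x^2_{\sigma^2_k}\|^2$, so the optimal pairing does not depend on the weights. We then have

$$\mathbf{z} = \mathbf{z}^*\big|_{\sigma^j = \sigma^{j*},\ \forall j \in \{1, 2\}} \tag{39}$$

²More generally, $\mu_i^N$ as used here could be any discrete normalised probability measure.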

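We then have the following main result.

Theorem 2. Let $i \in \{1, 2\}$ and consider two measures

$$\mu_i = \frac{1}{N}\sum_{j=1}^{N} \delta_{x_j^i} \tag{40}$$

along with the finite sets $U_i = \{x_1^i, \dots, x_N^i\}$ of sample points. Then the globally optimal solution to

$$\nu = \operatorname*{arg\,min}_{z \in \mathcal{D}_2} \sum_{i \in \{1, 2\}} w_i \mathcal{W}_2(z, \mu_i)^2 \tag{41}$$

is given by

$$\nu = \frac{1}{N}\sum_{k=1}^{N} \delta_{z_k^*} \tag{42}$$

where

$$z_k^* = w_1 x^1_{\sigma^{1*}_k} + w_2 x^2_{\sigma^{2*}_k} \tag{43}$$

and where $\sigma^{i*}$ is given as the output of Algorithm 1.

Let $\operatorname{ind}(x_j^i) = j$. Then Algorithm 1 is a solution to the optimisation problem in (38).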


Algorithm 1 Solution to the Optimal Information Fusion Problem with Randomly Sampled Empirical Measures

1: $\bar{U}_i = U_i$, $\forall i \in \{1, 2\}$
2: for $k = 1$ to $N$ do
3:   nearest $= \infty$
4:   for each element $x_j^1$ in $\bar{U}_1$ do
5:     for each element $x_l^2$ in $\bar{U}_2$ do
6:       if $\|x_j^1 - x_l^2\|^2 <$ nearest then
7:         $\sigma^{1*}_k = \operatorname{ind}(x_j^1)$, $\sigma^{2*}_k = \operatorname{ind}(x_l^2)$
8:         nearest $= \|x_j^1 - x_l^2\|^2$
9:       end if
10:     end for
11:   end for
12:   $\bar{U}_i = \bar{U}_i \setminus \{x^i_{\sigma^{i*}_k}\}$, $\forall i \in \{1, 2\}$
13: end for


Proof: Given the derivation leading to (39), it remains only to show that Algorithm 1 solves the optimisation problem given in (38). Algorithm 1 pairs points in $U_1$ with points in $U_2$ based on the (squared) Euclidean distance between them, i.e. the outcome implies that $x^1_{\sigma^{1*}_1}$ and $x^2_{\sigma^{2*}_1}$ are the two closest points to each other when picking disjointly from both $U_1$ and $U_2$. Then $x^1_{\sigma^{1*}_2}$ and $x^2_{\sigma^{2*}_2}$ are the next two closest points, and so on.

Consider the problem in (38) and note that $x^1_{\sigma^{1*}_1}$ and $x^2_{\sigma^{2*}_1}$ minimise the first ($k = 1$) summation term over any other possible pairing of points. Fixing these two points, it then follows that the pair $x^1_{\sigma^{1*}_2}$ and $x^2_{\sigma^{2*}_2}$ minimises the second ($k = 2$) summation term over any other possible pairing of points, and so on for all $k$. Because $\sigma^{i*}_k$ only appears in one summation term in (38), and of course the ordering in $k$ is unimportant, it follows that this minimises the total summation. This completes the proof. ■
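Algorithm 1 can be transcribed into Python as follows; `greedy_pairing`, `exact_pairing`, `fuse_empirical` and `pairing_cost` are hypothetical names, and points are tuples in $\mathbb{R}^m$. Because the objective in (38) is proportional to $\sum_k \|x^1_{\sigma^1_k} - x^2_{\sigma^2_k}\|^2$, the greedy pairing can be compared against an exact minimisation over all pairings for very small $N$ (for realistic $N$, a linear assignment solver would be used instead). Such a comparison is a worthwhile sanity check: on the one-dimensional example $U_1 = \{0, 2\}$, $U_2 = \{1.9, 3.5\}$, the greedy rule first pairs 2 with 1.9 and then 0 with 3.5 (total squared distance 12.26), whereas pairing 0 with 1.9 and 2 with 3.5 gives 5.86.

```python
import itertools

def sq_dist(x, y):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def greedy_pairing(U1, U2):
    """Transcription of Algorithm 1: repeatedly pair the closest remaining points.

    Returns a list of (j, l) index pairs, i.e. (sigma^1_k, sigma^2_k) for k = 1..N.
    """
    remaining1, remaining2 = set(range(len(U1))), set(range(len(U2)))
    pairs = []
    for _ in range(len(U1)):
        j, l = min(((j, l) for j in remaining1 for l in remaining2),
                   key=lambda jl: sq_dist(U1[jl[0]], U2[jl[1]]))
        pairs.append((j, l))
        remaining1.remove(j)
        remaining2.remove(l)
    return pairs

def exact_pairing(U1, U2):
    """Minimise the total squared distance over all N! pairings (tiny N only)."""
    best = min(itertools.permutations(range(len(U2))),
               key=lambda perm: sum(sq_dist(U1[j], U2[perm[j]]) for j in range(len(U1))))
    return list(enumerate(best))

def pairing_cost(U1, U2, pairs):
    """Total squared distance of a pairing, proportional to the objective in (38)."""
    return sum(sq_dist(U1[j], U2[l]) for j, l in pairs)

def fuse_empirical(U1, U2, pairs, w1):
    """Fused support points z_k = w1 x^1 + w2 x^2 as in (43); each carries mass 1/N."""
    w2 = 1.0 - w1
    return [tuple(w1 * a + w2 * b for a, b in zip(U1[j], U2[l])) for j, l in pairs]

U1, U2 = [(0.0,), (2.0,)], [(1.9,), (3.5,)]
g, e = greedy_pairing(U1, U2), exact_pairing(U1, U2)
print(pairing_cost(U1, U2, g), pairing_cost(U1, U2, e))  # greedy vs exact
print(fuse_empirical(U1, U2, e, w1=0.5))
```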

A more general discussion of the existence and uniqueness of solutions to the general optimisation problem of computing a measure in $\mathcal{P}_p$ that is closest (in a weighted Wasserstein sense) to a given set of (possibly more than 2) input measures is found in [14]. For discrete input measures in $\mathcal{D}_2$ the optimal solution is not necessarily unique (though it appears to be generically unique). It is worth noting [25], where a similar mathematical formalism is given for computing a discrete measure that is closest (in the Wasserstein sense) to two discrete input measures. No reference to empirical measures is made in [25], and so the inputs there may have common support. The application considered in [25] is texture mixing in computer vision, and no relation to information fusion is discussed. More efficient approximations (or relaxations) of the underlying optimisation problem may also be considered [26] to reduce the complexity.

Moreover, we note that the algorithm presented here is just a direct approach for computing a specific case of the well-known McCann interpolant measure (i.e. a measure that lies on the geodesic connecting two input measures) when the input measures are discrete [23].

The real novelty of this presentation lies in the connection to information fusion of sampled probability measures and the potential applications of such fusion in fields like Monte Carlo estimation [9].

A. Illustrative Examples

Two arbitrary Gaussian mixture densities are considered, each with 8 components. Each mixture is randomly sampled at $N = 1000$ points to generate the empirical measures $\mu_i$, $\forall i \in \{1, 2\}$. The individual continuous densities $\mu_i$, $\forall i \in \{1, 2\}$, along with the sample points $U_i$, $\forall i \in \{1, 2\}$, are shown in Figure 1.


[Figure 1: Actual and Sampled Probability Density $\mu_i(x)$]

Fig. 1. The two continuous inputs $\mu_i$, $\forall i \in \{1, 2\}$, along with the sample points $U_i$, $\forall i \in \{1, 2\}$. The samples are shown at a height of 0.1 for clarity.


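The solution to the direct sampled fusion problem

$$\nu = \operatorname*{arg\,min}_{z \in \mathcal{D}_2} \sum_{i \in \{1, 2\}} w_i \mathcal{W}_2(z, \mu_i)^2 \tag{44}$$

is then computed according to Theorem 2. For visualisation, a Kernel method is applied to the fused sample points $z_k^*$ defining $\nu$ in order to obtain

$$\hat{\nu}(x) = \frac{1}{N h \sqrt{2\pi}} \sum_{i=1}^{N} \exp\left(-\frac{(x - z_i^*)^2}{2h^2}\right) \tag{45}$$

with bandwidth $h = (4\hat{\sigma}^5/(3N))^{1/5}$, where $\hat{\sigma}$ is the sample standard deviation [10]. Both $\nu$ and $\hat{\nu}$ are displayed in Figure 2.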
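Equation (45) is a standard Gaussian kernel density estimate. The Python sketch below evaluates it with the bandwidth $h = (4\hat{\sigma}^5/(3N))^{1/5}$; the fused support points are placeholders rather than the data behind Figure 2, and the function names are hypothetical. A crude Riemann sum over a wide grid confirms that $\hat{\nu}$ integrates to approximately one.

```python
import math
import statistics

def rule_of_thumb_bandwidth(samples):
    """Bandwidth h = (4 sigma^5 / (3 N))^(1/5), with sigma the sample standard deviation."""
    sigma = statistics.stdev(samples)
    return (4.0 * sigma ** 5 / (3.0 * len(samples))) ** 0.2

def gaussian_kde(samples, h):
    """Return x -> (1 / (N h sqrt(2 pi))) * sum_i exp(-(x - z_i)^2 / (2 h^2)), as in (45)."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-((x - z) ** 2) / (2.0 * h * h)) for z in samples)
    return density

# Placeholder fused support points z_k (illustrative only).
z = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5, 2.2, 2.4]
h = rule_of_thumb_bandwidth(z)
nu_hat = gaussian_kde(z, h)

dx = 0.01
grid = [-10.0 + dx * k for k in range(2001)]
total = sum(nu_hat(x) for x in grid) * dx
print(h, total)  # total should be close to 1
```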


[Figure 2: Sampled Information Fusion Result $\nu(x)$]

Fig. 2. The sampled information fusion solution $\nu$ and a corresponding Kernel-based density estimate $\hat{\nu}$. The sample points are plotted at a height of 0.1 for clarity. The sampled solution $\nu$ (and thus the corresponding density estimate $\hat{\nu}$) is computed using only the sets $U_i$, $\forall i \in \{1, 2\}$, as input.

V. Concluding Remarks

This work introduced an information fusion protocol that delivers a probability measure that is the ‘closest’ measure to a collection of given input measures (in the sense of the weighted Wasserstein metric on the space of probability measures).

We considered the explicit computation of this information fusion result for two important (application-driven) special cases. Firstly, we examined the case of two Gaussian input measures which may or may not be independent, detailed the computation of the fusion result and explored the consistency of this result. Secondly, we examined the case of two empirical (randomly sampled) input measures and provided an information fusion computation that works directly on the two discrete empirical samples.

In the Gaussian case, no comparison, either through simulation or analysis, with covariance intersection or log-linear opinion pools [5], [15], [16] has been considered, and such work would be necessary before Wasserstein information fusion could be considered applicable. Comparison with other related work [17]-[21] is also important and a potential topic for future work.

Similarly, in the case of empirical measures, further study and analysis is required. Also, comparison with the computationally efficient approximations for sampling the output of the product of input measures [12], [13], among other approaches, is needed.